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Abstract 

Transformational  programming  is  a  methodology  that  intends  to  formalize  the  development 
of  programs  from  problem  specifications.  Given  the  recent  effort  towards  the  design  of  a  common 
prototyping  system  (CPS)  for  the  Ada  programming  language,  transformation  systems  may  be 
reconsidered  as  possible  components  of  prototyping  systems.  This  paper  examines  and  evaluates 
three  approaches  to  transformational  programming: 

•  The  Munich  CIP  project  (Computer-aided,  Intuition-guided  Programming)  consists  of  a 
strongly  typed,  wide-spectrum  language  with  user-defined  algebraic  types  and  a  semi¬ 
automatic  transformation  system  that  requires  user  guidance. 

•  By  contrast,  “Algorithmics,”  the  work  on  algebraic  specification  originating  from  IFIP 
WG  2.1,  is  a  pure  pencil-and-paper  approach  to  transformational  programming.  It  provides 
a  concise,  uniform  mathematical  notation  and  includes  work  on  nondeterminism. 

•  RAPTS  (Robert  A.  Paige’s  Transformation  System)  is  a  fully  mechanical  system  that 
transforms  high-level  specifications  to  C  code.  The  specifications  are  given  in  a  functional 
subset  of  SETL  augmented  with  fixed-point  operations. 

First,  we  describe  each  system  in  detail  and  highlight  interesting  features.  Next,  we  establish 
a  framework  of  common  criteria  by  which  such  different  transformational  systems  can  be  evalu¬ 
ated.  Finally,  we  point  out  common  features  and  differences  of  the  three  systems  and  compare 
them. 
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1  Introduction 


Programming  is  a  complex  task;  large  programs  are  difficult  to  produce,  and  once  produced  they  are 
often  unreliable  and  expensive  to  maintain  or  improve.  The  “software  crisis”  remains  an  important 
issue  in  software  engineering.  A  large  amount  of  research  has  been  going  on  to  overcome  this  crisis: 
On  one  hand,  a  considerable  portion  of  the  work  has  focused  on  formalizing  and  automating  the 
software  development  process;  on  the  other  hand,  prototyping  has  been  promoted  as  a  method  of 
obtaining  information  about  a  problem  before  implementing  it  in  a  production  language. 

Traditionally,  programs  have  been  verified  experimentally  by  choosing  a  number  of  input  test 
cases,  running  the  program,  and  evaluating  the  results.  The  problem  with  this  approach  is  that 
it  can  never  formally  prove  the  correctness  of  the  program.  There  might  always  be  cases  not 
captured  by  the  test  data.  Prototyping  attempts  to  catch  errors  at  an  early  a  stage  of  the  program 
development . 

Formal  verification,  on  the  other  hand,  is  an  analytic  process  that  defines  in  mathematical 
terms  what  it  means  for  a  program  to  be  correct,  given  a  mathematical  definition  of  the  program¬ 
ming  language  used,  and  a  formal  specification  of  the  problem  being  solved.  However,  the  formal 
verification  of  large  programs  has  turned  out  to  be  impractical. 

By  contrast,  the  transformational  approach  (see  [PS83]  for  an  excellent  survey)  is  a  synthetic 
or  constructive  one.  A  program  is  derived  from  a  problem  specification  by  successive  application  of 
correctness-preserving  transformations  that  lead  to  a  correct  implementation  of  the  problem.  A  key 
point  is  the  reusability  of  transformation  rules;  once  a  rule  is  proven  correct,  it  can  be  used  again 
if  applicable  to  the  particular  situation.  Libraries  of  known  transformations  can  be  established;  as 
the  libraries  grow,  the  number  of  new  proofs  required  in  the  development  of  a  program  decreases. 

An  advantage  of  transformational  programming  is  the  tight  coupling  of  program  development 
and  verification.  A  complete  program  consists  of  the  specification,  the  final  executable  program, 
and  the  complete  history  of  intermediate  transformations.  This  history  serves  as  a  program  doc¬ 
umentation,  since  it  captures  all  ideas  and  decisions  relevant  for  the  program.  Later  modification 
is  facilitated  because  it  corresponds  to  stepping  back  in  the  development  and  following  an  other 
branch  at  a  point  where  a  decision  was  made. 

The  transformational  paradigm  leaves  a  lot  of  room  for  automation;  the  programmer  can  be 
relieved  of  tedious  tasks  that  come  up  during  program  development  such  as  rewriting  program 
pieces,  verifying  applicability  conditions  of  transformations,  reviewing  the  development  history  etc. 
However,  important  design  decisions  are  generally  left  to  the  programmer. 

Recently,  a  considerable  effort  has  been  made  towards  the  development  of  a  common  prototyping 
system  (CPS,  see  [Cfab88])  for  the  Ada  language.  As  a  future  goal,  we  would  like  to  explore  how 
a  transformational  component  can  be  incorporated  into  such  a  prototyping  environment.  The 
transformational  system  could  be  used  to  aid  replacing  pieces  of  the  prototype  by  more  efficient 
pieces  obtained  by  transforming  into  a  target  language. 

From  this  perspective,  we  take  a  detailed  look  at  several  approaches  to  transformational  pro¬ 
gramming  in  the  next  three  sections,  emphasizing  relevant  features.  In  the  last  section,  we  present 
a  framework  of  common  criteria  for  the  comparison  of  transformational  systems.  Finally,  we  apply 
these  criteria  to  the  systems  under  consideration. 

2  The  Munich  CIP  Approach 

The  CLP  project  was  created  with  the  goal  of  producing  an  implemented  software  development 
system  using  the  transformational  paradigm.  CIP  stands  for  “Computer-aided,  Intuition-guided 
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Programming”.  The  project  consists  of  two  components:  The  wide-spectrum  programming  lan¬ 
guage  CIP-L  [B+85a],  and  the  program-transformation  system  CIP-S  [B+87a],  The  user  of  CIP-S 
specifies  a  problem  in  CIP-L  and  derives  a  (hopefully  efficient)  implementation  of  the  problem. 

A  wide-spectrum  language  is  a  single  formal  language  which  includes  specification  constructs, 
implementation  constructs,  and  any  intermediate  constructs  needed  for  program  development.  The 
use  of  such  a  language  makes  it  easy  to  formulate  program  transformations  as  local,  correctness¬ 
preserving,  source-to-source  transformations  on  a  common  semantic  basis.  CIP-L  is  a  scheme 
language ,  i.  e.,  a  language  without  a  fixed  set  of  basic  data  types.  This  makes  it  possible  to 
manipulate  program  schemes  which  can  later  be  instantiated  as  needed. 

The  system  CIP-S  is  an  interactive  system  whose  purpose  is  to  assist  the  programmer  in  clerical 
work  and  other  tasks  that  can  be  automated.  It  performs  routine  task  such  as  keeping  track  of 
various  versions  of  a  program,  and  it  provides  a  mechanism  for  performing  transformations.  In 
addition,  it  helps  the  user  verify  the  applicability  of  a  transformation  to  a  portion  of  the  program. 
It  also  maintains  a  library  of  transformation  rules  and  assists  in  searching  the  library.  Decisions, 
however,  are  always  made  by  the  user;  he  or  she  supplies  the  intuition. 

It  is  interesting  to  note  that  the  CIP  system  was  developed  using  exactly  the  CIP  methodology. 
First,  a  formal  specification  of  CIP-S  was  written,  which  includes  a  logical  calculus  for  program 
transformations.  Then,  a  prototype  of  the  system  was  derived  from  the  specification  using  program 
transformations.  The  system  has  not  yet  reached  its  final  state;  therefore  parts  of  its  description 
below  refer  to  the  specification  rather  than  the  implemented  system. 

For  a  detailled  evaluation  of  the  CIP  project  see  [WJS87]. 

2.1  The  System  CIP-S 

The  purpose  of  CIP-S  is  the  transformational  development  of  program  schemes.  This  includes 
manipulation  of  actual  programs,  derivation  of  new  transformation  rules  within  the  system,  trans¬ 
formation  of  algebraic  types  and  type  schemes,  and  verification  of  applicability  conditions. 

Transformation  rules  in  CIP  consist  of  three  components:  input  template,  output  template,  and 
applicability  condition.  The  input  and  output  templates  are  program  schemes,  possibly  including 
context  parameters,  which  allow  the  user  to  mark  fragmentary  terms  ( contexts )  as  replaceable.  A 
typical  rule,  for  example,  expresses  the  distributivity  of  the  conditional  expression  over  varying 
contexts: 

F[if  C  then  A  else  B  fi] 

I 

if  C  then  F\A]  else  F[B]  fi 
Under  this  rule,  the  code  fragment 
x  :=  if  x  >  3  then  4  else  2  fi 

is  transformed  to 

if  x  >  3  then  x  :=  4  else  x  :=  2  fi. 

In  addition  to  such  rules,  CIP  provides  a  metalanguage  for  transformation  expressions,  which  allows 
transformational  algorithms  to  be  expressed  similarly  to  regular  expressions. 

The  transformation  rules  are  organized  as  generative  sets,  i.  e.,  a  small  set  of  powerful  elementary 
transformations  that  can  be  used  to  construct  new  rules.  The  basic  rules  include  general  principles 
(such  as  unfold,  fold),  the  definitional  rules  of  the  language,  and  the  axioms  and  inference  rules 
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of  the  predefined  types.  Furthermore,  some  frequently  used  derived  rules,  as  well  as  standard 
implementation  techniques  for  some  abstract  types,  are  available.  CIP-S  allows  the  user  to  define 
new  transformation  rules  and  maintain  libraries  of  rules. 

CIP-S  records  the  whole  development  of  a  program  from  a  specification.  It  allows  the  user  to 
browse  through  the  development,  backtrack  and  continue  with  another  program  version,  return  to 
previous  development  steps  etc.  The  system  assists  the  user  in  the  selection  and  application  of 
transformation  rules.  All  this  calls  for  a  sophisticated  interactive  user  environment  on  top  of  the 
system  core.  This  environment  makes  use  of  facilities  such  as  graphic  screens,  windows,  and  mice, 
and  allows  the  user  to  perform  all  the  system  functions  conveniently. 

There  are  several  languages  involved  in  the  design  of  CIP-S.  As  the  object  language ,  the  lan¬ 
guage  for  formulating  program  schemes,  an  algebraically  defined  language  is  assumed.  CIP-L  was 
designed  especially  to  serve  this  purpose.  Transformation  algorithms  are  formulated  in  a  special 
metalanguage,  which  consists  of  constructs  similar  to  regular  expressions,  over  transformation  rules 
as  described  above.  This  language  is  supposed  to  be  later  replaced  by  a  simple  applicative  lan¬ 
guage.  Finally,  the  language  for  enabling  conditions  of  transformation  rules  consists  of  predicates 
over  terms  of  the  object  language. 

Note  that  CIP-S  is  extensible  with  respect  to  the  transformation  libraries,  object  language  and 
user  environment.  Since  CIP-S  is  being  used  for  its  own  development,  it  is  even  possible  to  change 
the  implementation  language  by  going  back  to  the  stage  before  fixing  the  language,  and  changing 
to  a  different  language. 

2.2  The  Language  CIP-L 

In  order  to  accommodate  a  full  variety  of  programming  styles,  the  CIP  group  conceived  the  notion 
of  a  wide-spectrum  language,  a  single  language  providing  all  the  styles  required,  from  specification 
level  down  to  implementation  level.  CIP-L  has  the  following  components: 

1.  Algebraic  or  abstract  data  types.  Types  provide  the  means  for  specifying  basic  objects  and 
operations  on  those  objects.  They  determine  the  application  for  the  language. 

2.  The  scheme  language.  This  language  is  the  heart  of  CIP-L  and  provides  the  means  for  writing 
specifications  and  algorithms.  It  is  structured  hierarchically  with  levels  from  specification 
through  application,  procedure  down  to  control.  The  specification  level  is  called  kernel  and 
consists  of  expressions  over  the  basic  types. 

3.  Programs.  Programs  connect  the  first  two  parts  of  the  language.  A  program  is  a  finite  set  of 
components,  which  may  be  types,  computation  structures,  modules  or  devices.  Computation 
structures,  modules  and  devices  are  implementations  of  the  basic  data  types  underlying  the 
scheme  language. 

The  language  is  defined  formally.  Algebraic  data  types  are  defined  through  algebraic  seman¬ 
tics.  Only  the  kernel  of  the  scheme  language  is  defined  through  traditional  denotational  seman¬ 
tics  [Sto77,Gor79].  The  additional  levels  are  defined  in  terms  of  the  previous  levels  by  transforma¬ 
tional  semantics. 

CIP-L  has  an  abstract  syntax,  hence  its  external  representation  is  flexible.  The  language  can  be 
represented  by  an  ALGOL-like  or  Pascal-like  variant,  or  even  by  a  LISP-like  or  Prolog-like  variant, 
depending  on  what  the  particular  application  calls  for.  All  that  is  required  is  an  appropriate 
parser /unparser. 
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Other  notable  characteristics  of  CIP-L  ar efull  typing ,  modularity ,  provided  by  algebraic  types  on 
the  specification  level  and  by  computation  structures,  modules  and  devices  (modules  with  internal 
state)  on  the  implementation  level,  and  nondeterminism. 

It  is  not  surprising  that  some  modern  prototyping  languages  are  designed  similarly;  for  example, 
the  prototyping  language  Griffin  [D+90],  currently  under  development  at  New  York  University, 
shares  several  key  features  with  CIP-L:  strong  typing,  user-defined  algebraic  data  types,  and  a 
layered  semantics. 

We  will  now  examine  the  three  components  of  CIP-L  in  detail. 

2.3  Abstract  Data  Types 

In  programs  certain  identifiers  are  used  for  object  sets  (sorts),  for  elements  of  these  sets  (constants), 
and  for  functions  operating  on  them  (operations).  Before  we  give  an  interpretation  to  these  symbols, 
we  are  dealing  with  program  schemes.  The  purpose  of  abstract  types  is  to  give  an  interpretation  by 
presenting  the  sorts,  constants  and  operations  with  their  functionality  and  by  stating  the  properties 
in  form  of  axioms  (laws).  Abstract  types  may  be  constructed  hierarchically. 

An  abstract  type,  briefly  called  type,  consists  of  two  main  parts: 

•  The  signature ,  which  is  a  list  of  symbols  whose  meaning  is  specified  within  the  type.  Each 
symbol  is  associated  with  a  specification  of  its  kind  (sort  or  carrier  set  of  an  abstract  type): 
Symbols  for  sorts  are  given  as  attributes  the  keyword  sort,  symbols  for  constant  elements 
their  sort,  and  operation  symbols  are  given  their  functionality.  A  subset  of  these  symbols  is 
made  visible  to  the  outside  by  the  list  of  constituents  of  the  type.  Symbols  not  made  visible 
are  called  hidden  symbols.  The  signature  determines  a  language  of  well-formed  terms  (formed 
from  the  constants  by  applying  the  operations)  which  may  also  contain  free  identifiers. 

•  The  collection  of  laws,  which  specify  the  properties  of  the  symbols.  The  laws  are  first-order 
logic  formulas  built  from  equations  and  inequalities  and  logical  operators  A,  V,  =>,  O,  and  of 
the  quantifiers  V,  3,  over  sorts  of  the  type. 

The  meaning  of  a  type  T  is  defined  to  be  the  class  of  all  term-generated  models  of  T.  Note  that 
there  might  be  many  different  models  for  a  type  T,  or  there  might  be  only  one,  depending  on  the 
particular  set  of  laws  supplied. 

For  example,  the  following  type  specification  describes  integer  arithmetic  modulo  3: 

type  M0D3  = 

mod3,  zero ,  one,  two,  succ  : 
sort  mod3, 
mod3  zero,  one,  two, 
funct  (mod3)  mod3  succ 
laws  mod3  x,  y  : 
zero  one, 
zero  two, 
one  7^  two, 
succ(zero)  =  one, 
succ(one)  =  two, 
succ(two)  =  zero 
end  of  type 
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It  lias  the  visible  constituents  mod3,  zero ,  one,  two  and  succ ,  where  ^ero,  one  and  two  are  constants 
of  sort  mod3,  and  succ  is  a  function  taking  one  mod3  parameter  and  returning  a  mod3. 

A  type  T  may  depend  on  other  types  in  three  ways: 

type  inclusion:  A  type  T  can  be  made  available  by  an  instantiation,  possibly  with  renaming,  using 
an  include  clause.  Such  an  instantiation  is  equivalent  to  the  textual  substitution  of  the  body 
of  T  (with  consistent  renaming);  there  is  no  protection  on  the  models  of  T.  Instantiations  are 
simply  a  shorthand  notation  for  type  bodies;  they  have  no  independent  semantics. 

base  types:  A  type  may  use  another,  “primitive”  type  by  means  of  a  based  on  clause.  In  this 
case,  the  primitive  type  is  protected  against  modifications  of  the  carrier  sets  of  its  models. 
This  is  called  hierarchy  preservation  and  guarantees  independent  implement  ability  of  the 
primitive  type.  The  types  mentioned  in  the  based  on  clause  should  be  thought  of  as  part  of 
the  new  type. 

type  parameters:  Type  specifications  may  be  parametrized.  The  parameters  can  be  sort  symbols, 
constant  symbols,  and  operation  symbols.  They  are  attributed  in  the  same  way  as  within  a 
signature.  Such  type  specifications  are  called  type  schemes  or  generic  types.  They  may  be 
instantiated  both  in  based  on  clauses  and  include  clauses. 

Modes  are  a  shorthand  notation  for  certain  types  and  type  schemes  which  are  frequently  used. 
Modes  are  syntactically  introduced  by  (possibly  recursive)  mode  declarations.  The  semantics  of 
mode  declarations  is  explained  by  instantiation  of  the  associated  types;  thus  recursive  modes  are 
explained  in  a  straightforward  manner.  There  are  two  basic  kinds  of  mode  declarations: 

sum:  A  sum  is  a  disjoint  union  of  a  finite  number  of  variants  and  comes  with  constructors,  projec¬ 
tions,  and  boolean  test  functions,  which  indicate  whether  an  intended  projection  would  yield 
a  defined  result. 

product:  A  product  is  a  finite,  heterogeneous  tuple.  A  product  has  a  constructor  function  and 
component  selector  functions. 

For  practical  purposes,  the  programming  system  for  the  language  should  provide  a  collection 
of  predefined  types  as  a  basis  for  further  programming  activities.  Typical  standard  types  include 
boolean  values,  integers,  finite  sets  and  multisets,  finite  mappings,  sequences,  trees,  pointers  and 
pointer  structures. 

2.4  The  Scheme  Language  of  CIP-L 

We  will  present  the  different  style  levels  of  the  scheme  language  and  give  examples.  For  most 
examples,  we  will  use  an  ALGOL-like  representation. 

2.4.1  The  Expression  Language  for  Logic  and  Functional  Programming 

An  expression  denotes  objects  of  a  certain  kind  (e.  g.,  a  sort  of  an  algebraic  type).  Fundamental 
expressions  are  the  terms  over  basic  object  and  operation  symbols  of  an  underlying  algebraic  type; 
these  symbols  have  to  be  interpreted  in  some  model  of  that  type.  Other  examples  of  expressions 
follow: 

Guarded  Expressions.  The  guarded  expression  has  the  following  form: 
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if  Bi  then  Af  []  . .  ,  \  Bn  then  En  fi 

The  guards  B,  are  boolean  expressions,  and  the  £,  are  expressions  of  the  same  non- functional 
kind  m.  The  guarded  expression  is  nondeterministic,  its  value  is  one  of  the  £,-  for  which  the 
corresponding  Bi  is  true.  The  set  of  possible  values  is  called  breadth.  If  none  of  the  Bi  is  true, 
the  value  of  the  guarded  expressions  is  not  defined.  If  the  breadth  contains  only  one  value,  the 
expression  is  called  determinate ,  otherwise  it  is  called  nondeterminate. 

If  B  is  determinate,  the  guarded  expression 

if  B  then  £)  []  -i  B  then  £2  fi 

is  semantically  identical  to  the  conditional  expression 

if  B  then  £f  else  £2  fi 

A  guarded  expression  with  constantly  true  guards  describes  an  arbitrary  choice  between  the 
branches.  Its  breadth  is  the  union  of  the  breadth  of  £1  and  the  breadth  of  £2.  Such  a  guarded 
expression  may  be  abbreviated  by 

(#t  I  e2% 

an  expression  called  finite  choice.  The  finite  choice  has  the  following  property  with  respect  to 
the  application  of  a  function  /:  The  expressions  /((£i  []  £2))  and  ( / ( £1 )  []  f(E2 ))  have  the  same 
breadth.  This  holds  since  we  have  call-by- value  and  call-time  choice  semantics.  To  avoid  semantic 
problems,  we  do  not  allow  the  nondeterministic  choice  between  higher-order  functions. 

Function  Abstraction  and  Application.  An  abstraction  is  a  parameterized  expression;  it  is  of 
the  form 

(mi  xi, . . .,  m„  xn)  r  :  £ 

where  the  x,  are  parameters  of  kinds  m ;  and  may  occur  free  in  the  expression  £.  If  /  is  such  an 
abstraction,  then  the  application  of  /  to  expressions  £,•  of  appropriate  kinds  is  expressed  by 

/(£i,...,£re). 

The  functionality  (kind)  of  f  is  given  by 
funct  (mi, .  .  . ,  m n )  r. 

We  may  restrict  the  domain  of  arguments  by  putting  an  appropriate  assertion  behind  the  formal 
parameter  list,  as  in 

(nat  a,  nat  b  :  a  >  b)  nat  :  a  —  b. 

If  the  assertion  is  not  satisfied  for  the  actual  parameters,  the  result  of  the  application  is  unde¬ 
fined. 

Note  that  the  language  is  fully  functional;  functions  may  occur  as  parameters  and  as  results  of 
functions.  There  are  two  standard  higher-order  operations:  function  composition  o  and  function 
tupling  (as  in  FP). 

Fixpoints  and  Recursion.  A  function  may  be  defined  as  a  hxpoint  of  a  functional  equation. 
We  have  the  hxpoint  operator  Y  which  is  applied  to  a  pair  consisting  of  a  function  identifier  and 
an  abstraction  with  a  free  occurrence  of  that  identifier.  The  application  of  the  hxpoint  operator 
gives  the  minimal  solution  of  the  corresponding  functional  equation.  Consider,  for  example,  the 
following  definition  of  the  factorial: 
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(Y  /  :  (nat  n)  nat  : 

if  n  =  0  then  1 
else  n  *  /(n  —  1) 

fi 

) 

Here,  /  is  not  known  outside  of  the  fixpoint  expression  and  cannot  be  used  as  a  function  identifier. 

Note  that  (Y  /  :  A)  reduces  to  A  itself  if  there  is  no  free  occurrence  of  /  in  A.  Furthermore, 
the  fixpoint  operator  generalizes  to  systems  of  functional  equations. 

Descriptive  Constructs.  If  P  is  a  characteristic  predicate,  then  the  object  specification 
that  r  x  :  P( x) 

denotes  the  unique  x  of  kind  r  for  which  P( x)  holds.  This  form  is  called  description. 

If  P  is  not  a  characteristic  predicate,  we  may  still  write 

some  r  x  :  P(x). 

Here  the  breadth  of  this  comprehensive  choice  depends  on  P . 

We  may  form  sets  by  set  comprehension: 

{m  x  :  P(x)} 

denotes  the  set  of  all  objects  of  kind  m  that  satisfy  P(x).  (We  may  also  enumerate  sets;  enumeration 
can  be  viewed  as  a  comprehension). 

Expressions  may  be  quantified  using  the  universal  quantifier  V,  the  existential  quantifier  3  and 
the  unique  existential  quantifier  3j_ .  Quantified  expressions  have  boolean  results  and  are  typically 
used  in  conditional  expressions. 

2.4.2  The  Full  Applicative  Language 

This  level  of  the  language  provides  facilities  for  the  declaration  of  objects  and  functions. 

Object  Declarations.  We  may  give  objects  names  by  means  of  a  collective  object  declaration ;  we 
write 


( m  i  Xj , . . . ,  m  n  xn )  —  ( E\ , . . . ,  En ) 

and  introduce  the  object  identifiers  x, .  For  n  =  1,  the  parentheses  are  omitted;  we  simply  write 
m  x  =  E. 

We  may  restrict  object  declarations  in  a  similar  way  as  function  parameters;  consider  e.  g., 
(nat  a,  nat  b:  a  >  b)  =  (19,7). 

Naturally,  declarations  may  occur  sequentially. 

Function  Declarations.  If  m  is  a  function  kind  in  an  object  declaration,  we  have  a  function 
declaration  in  A-calculus  style  ( E  is  an  abstraction  in  this  case). 

Functions  may  be  declared  in  the  ALGOL  style  by  writing 
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funct  /  =  (m  x)  r  :  E. 

This  is  just  an  abbreviation  for  convenience.  Similarly,  a  unary  fixpoint  operator  may  be  suppressed 
by  replacing 

funct  (m)  r  /  =  (Y  /  :  (m  x)  r  :  E ) 
by  the  traditional  ALGOL-style  way  of  declaring  a  recursive  function: 
funct  /  =  (m  x)  r  :  E, 

2.4.3  The  Procedural  Language 

This  level  of  CIP-L  contains  constructs  such  as  variables,  assignments  and  procedures. 

Variables  and  Assignments.  Similar  to  the  object  declaration,  we  may  declare  a  variable ,  which 
may  be  initialized: 

var  m  x  :=  E. 

We  assign  a  value  to  a  variable  using  the  assignment  statement 
x  :=  E 

with  the  semantics  of  variables  as  reusable  object  identifiers. 

Variables  can  be  declared  and  initialized  collectively,  as  in 

(var  int  i,var  int  j)  :=  (3,4), 

and  their  values  may  be  changed  by  a  collective  assignment,  e.  g., 

(i,j)  ■=  (7,13). 

Statements  are  separated  using  . 

Procedures.  Procedures  are  different  from  functions  in  that  they  do  not  yield  a  result.  In 
CIP-L,  expressions  are  strictly  distinguished  from  statements  to  keep  the  collection  of  program 
transformations  manageable.  Hence  there  are  only  pure  procedures.  Procedures  either  explicitly 
change  formal  variable  parameters  or  implicitly  modify  nonlocal  variables  (suppressed  variable 
parameters).  Every  procedure  call  can  be  reduced  to  an  assignment  to  the  variable  parameters. 
Procedures  are  declared  as  in  the  following  example: 

proc  assigntwo  =  (var  int  i)  :  i  :=  2, 

and  are  called  in  a  call  statement,  as  in 

var  int  k ;  call  assigntwo(k). 

To  avoid  aliasing,  it  is  required  that  no  two  (explicit  or  implicit)  variable  parameters  of  a  procedure 
may  be  associated  with  the  same  variable. 

2.4.4  The  Control-oriented  Language 

This  part  of  CIP-L  consists  of  the  control-flow  constructs  such  as  the  usual  iteration  statements 
(while  loop,  do  loop  with  explicit  leave  statement  in  the  loop  body).  It  also  contains  labels  and 
goto  statements. 
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2.4.5  Parallel  Constructs 

CIP-L  allows  limited  parallelism  in  the  form  of  the  parallel  composition  of  blocks.  Critical  regions 
may  be  constructed  using  the  following  form: 

await  C  then  P  endwait, 

where  C  is  a  condition,  and  P  is  the  critical  region. 

2.5  Programs  in  CIP-L 

Computation  structures,  modules,  and  devices  provide  interpretations  of  the  algebraic  types  that 
underly  the  scheme  language.  They  can  be  grouped  together  as  components  of  a  program.  The 
program  is  executed  by  invoking  executable  visible  constituents  of  its  components. 

Computation  Structures.  A  computation  structure  is  a  collection  of  declarations  for  sorts, 
objects,  and  functions  that  are  made  available  to  the  outside.  Computation  structures  provide  a 
means  for  implementing  types.  The  declaration  of  a  computation  structure  has  the  form 

structure  CS  =  <C  constituents  >■ 

D\,  ■ .  ■ ,  Dr 

end  of  structure 

The  list  of  constituents  corresponds  to  that  of  a  type;  it  consists  of  symbols  for  sorts,  constants, 
and  functions  visible  to  the  outside.  Since  computation  structures  are  intended  as  implementations 
for  types,  procedure  identifiers  must  not  appear  in  the  list  of  constituents.  Computation  structures 
may  be  parameterized  and  thus  used  as  “implementation  schemes”. 

The  body  D\, . . . ,  Dr  of  the  structure  provides  definitions  at  least  for  the  constituents;  further 
“hidden”  entities  may  be  defined  for  internal  use.  The  kinds  of  definitions  that  may  appear  in 
structures  include: 

•  instantiations  of  types,  type  schemes,  (parameterized)  structures,  and  modules 

•  declarations  of  modes,  functions,  objects,  and  procedures  without  global  variables 

The  following  example  illustrates  how  a  type  NEWBOOL  might  be  implemented: 

structure  BO OLIMPL  =  newbool ,  true,  false,  and,  or,  not : 
mode  newbool  =  false  \  true, 
funct  and  =  (newbool  x,  y)  newbool  : 

if  x  —  true  A  y  =  true  then  true  else  false  fi, 
funct  or  =  (newbool  x,y)  newbool  : 

if  x  —  false  A  y  =  false  then  false  else  true  fi, 
funct  not  =  (newbool  x )  newbool  : 
if  x  —  true  then  false  else  true  fi 
end  of  structure 

A  computation  structure  is  called  a  syntactically  correct  implementation  of  a  type  if  the  ele¬ 
ments  of  the  respective  constituent  lists  together  with  their  sorts  and  functionalities  coincide  (after 
consistent  renaming).  A  syntactically  correct  implementation  of  a  type  is  called  semantically  correct 
if  the  implementation  provides  a  model  of  the  type. 
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Modules  and  Devices.  Modules  and  devices  are  similar  to  computation  structures,  except  that 
they  may  also  export  procedures.  Devices  may,  in  addition,  contain  definitions  of  hidden  variables 
which  can  be  manipulated  through  the  procedures  provided.  An  example  for  a  module  would  be 
a  collection  of  procedures  manipulating  a  stack  given  as  a  parameter.  One  could  imagine  a  device 
which  supplies  similar  functions,  but  has  the  stack  as  an  internal  variable  accessible  only  through 
the  procedures. 

3  Algorithmics  —  an  Algebraic  Approach 

The  Algorithmics  group  (IFIP  WG  2.1)  is  motivated  by  the  conviction  that  a  great  deal  of  the 
activities  of  algorithmic  development  should  and  can  be  performed  in  a  similar  way  as  mathemati¬ 
cal  activities.  The  group  is  developing  a  language  for  “algorithmic  expressions”  with  the  idea  that 
algorithms  are  developed  by  manipulating  such  expressions  [Mee83,Mee84,B+85b,B+87b],  The  key 
point  is  that  the  language  should  be  a  uniform  framework  rather  than  a  union  of  a  specification 
language  and  an  implementation  language:  it  must  be  possible  to  view  all  expressions  as  specifi¬ 
cations,  although  not  all  expressions  need  to  suggest  an  implementation.  The  language  should  be 
comparable  to  the  language  used  by  mathematicians,  with  notations  that  give  a  convenient  way  to 
express  concepts  and  facilitate  reasoning.  Clearly,  a  nice  algebraic  structure  is  a  prerequisite  for 
obtaining  interesting  results,  since  otherwise,  no  general  laws  can  be  expressed,  and  each  step  has 
to  be  verified  afresh. 

ft  is  important  to  note  that  the  specific  notational  conventions  used  should  be  given  little 
weight.  The  idea  is  to  find  the  right  balance  between  readability,  terseness,  and  dependability.  In 
our  presentation  of  the  framework,  we  try  to  use  a  “conventional”,  self-explanatory  notation  using 
parentheses  only  to  avoid  ambiguities. 

3.1  Structures 

First  of  all,  we  need  to  define  some  objects  to  work  on.  Suppose  D  is  a  domain,  e.  g.,  numbers  or 
booleans.  We  define  a  new  domain 

S'd  =  D  0  S'd  x  Sd, 

the  domain  of  D-structures,  each  of  which  is  either  an  element  of  D  or  constructed  from  two 
D -structures. 

To  actually  build  D-structures,  we  use  an  injection  operation  '  and  a  construction  operation 
+  .  If  a  is  an  element  of  D,  then  '  a  stands  for  the  corresponding  element  of  Sjj  .  We  shall  write  a 
instead  of  ''a.  If  x  and  y  are  .D -structures,  then  x  -f  y  denotes  the  D-structure  constructed  from  x 
and  y.  Sjj  is  the  set  of  all  structures  that  can  be  built  from  D  by  a  finite  number  of  injections  and 
constructions. 

We  can  introduce  an  identity  element  by  redefining 
S'd  =  D  0  {0}  ®  Sd  x  S'd 
and  imposing  the  identity  law 
a:  +  0  =  0  +  £  =  x. 

Now  we  have  a  reasonable  starting  point,  since  we  can  obtain  familiar  structures  by  imposing 
other  algebraic  laws.  Of  particular  interest  are  the  laws  of  associativity : 
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X  +  (y  +  z)  =  (x  +  y)  +  z, 
of  commutativity. 

x  +  y  =  y  +  x, 
and  of  idempotency: 

X  +  X  =  X. 

ft  is  interesting  to  note  that  with  each  new  law  we  get  another  familiar  data  structure:  we  get, 
successively,  lists,  multi-sets  and  sets.  For  sets,  '  is  the  function  a  {a},  +  is  the  set  union  U, 
and  0  is  the  empty  set. 

3.2  Basic  Operations 

We  will  use  the  following  lemma  to  develop  further  results: 

Lemma  3.1  (Induction  Lemma)  Let  f  and  g  be  two  functions  defined  on  Sfi,  satisfying  the 
following  conditions: 

(i)  f  0  =  g  0 

(ii)  f  a  =  g  a,  and 

(Hi)  f  x  =  g  x  and  f  y  =  g  y  as  induction  hypothesis  implies  f  (x  +  y)  =  g(x  +  y) 

Then  f  =  g. 

Proof.  By  induction  on  the  complexity  of  the  function  argument. 

Note  that  the  first  part  of  the  lemma  can  be  omitted  if  Sjj  does  not  have  an  identity. 

Let  us  introduce  the  following  basic  operations: 

Map.  The  operator  *  applies  a  function  to  each  “member”  (elementary  component)  of  its  argument, 
and  the  result  is  a  structure  of  the  function  values  obtained.  If  /  is  a  function,  then  /  *  stands  for 
the  function  satisfying 

(i)  /  *  0  =  0 

(ii)  f  *  a  =  "  /  a, 

(iii)  f  *  {x  +  y)  =  f*x+f*y. 


Filter.  The  operator  <  takes  a  predicate  p  and  a  structure  x  and  returns  the  structure  of  compo¬ 
nents  of  x  that  satisfy  p.  p  <  stands  for  the  function  satisfying 


(i)  p  <  0  =  0 


(ii)  p  <  a  = 


a  if  pa 
0  otherwise 


(iii)  p  <  (x  +  y)  =  (p  <  x)  +  (p  <  y) 
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Reduce.  If  ®  is  a  binary  operation  in  D,  and  x  is  in  Sjj .  then  0/  £  returns  the  value  obtained  by 
inserting  0  between  adjacent  components  of  a\  0/  is  the  function  satisfying 

(i)  if  0  has  an  identity  element  e  such  that  e0a  =  a0e  =  a,  then  0/0  =  e, 

(ii)  0/  a  =  a,  and 

(hi)  ©/  U  1  .!/)  --  (©/  x)  0  (0/  y). 

Clearly,  0  must  be  associative  for  0/  to  be  unique. 

Now  that  we  have  some  basic  operations,  we  can  formulate  a  few  laws.  First,  *  distributes 
through  +;  for  all  structures  x  and  y  we  have 

f  *  (x  +  y)  =  (f  *  x)  +  (/  *  y). 

Second,  *  distributes  backwards  through  functional  composition: 

(/  °ff)*  =  (f  *)  o  (g  *). 

Furthermore,  if  /  is  injective  with  inverse  /-1,  then 

(/  *)_1  =  (/_1  *)■ 

We  have  some  rules  involving  <.  Filtering  is  commutative: 

p<q<x  =  q<p<x. 
p  <  is  an  idempotent  operation,  i.  e., 
p<p<x  =  p<  x. 

Finally,  the  following  commutativity  relation  holds  between  *  and  <: 
p  <  f  *  x  =  /  *  (p  o  /)  <  x. 

The  proofs  of  these  laws  are  straightforward  applications  of  the  induction  lemma. 

3.3  Homomorphisms 

A  homomorphism  is  essentially  a  linear  operation  with  respect  to  +.  A  function  ft  is  a  homo¬ 
morphism  in  Se >i  Se2  if  there  exists  an  associative  operator  0  with  identity  element  e  such 

that 

(i)  h  0  =  e  and 

(ii)  h{x  -\-y)  =  hx(&hy 

If  h  0  is  not  defined,  then  0  need  not  have  an  identity  element. 

Note  that  this  gives  an  algebraic  formulation  of  the  “Divide  and  Conquer”  paradigm;  part  (ii) 
tells  us  that  to  conquer  a  compound  structure  z,  we  can  divide  it  in  two  parts  x  and  y.  conquer 
these  parts,  and  combine  the  results. 

It  is  worthwhile  to  examine  homomorphisms,  since  they  represent  a  general  class  of  operations, 
of  which  /  *  and  0/  are  special  cases.  By  combining  them  in  the  form  (0/)  o  (/  *),  all  such 
homomorphisms  can  be  expressed.  This  can  be  stated  in  form  of  another  lemma: 
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Lemma  3.2  (Homomorphism  Lemma)  A  function  h  is  a  homomorphism  iff  h  =  (0/)  o  (/*) 
for  some  operator  0  and  function  f . 

Proof.  The  “only  if”  part  can  be  shown  using  the  distributive  laws  for  *  and  /,  and  the  “if”  part 
is  an  application  of  the  induction  lemma. 

We  can  derive  several  useful  identities  as  a  consequence  of  the  Homomorphism  Lemma.  They 
generalize  the  distributive  laws  of  *,  <  and  /. 

Lemma  3.3  (Domain  Switching)  Let  function  f ,  operations 0  and (x)  satisfy  f  (a  0  6)  =  (/  a)  (x)  (/  6) 
and  /  0/ 0  =  ®/0. 

Thenf  o  (0/)  =  (®/)  o (/  *). 

Proof.  Let  g  =  /  o  (0/)  and  apply  the  homomorphism  lemma. 

Lemma  3.4  (Promotion)  For  arbitrary  function  f ,  predicate  p  and  associative  operator  0  we 
have: 

(^-promotion)  (/  *)  o  (  +  /)  =  (+/)  o  ((/  *)*) 

(<-promotion)  (p  <)  o  (  +/)  =  (  +  /)  o  (( p  <)*) 

(/-promotion)  (0/)  o  (  +  /)  =  (0/)  o  ((0/)*). 

Proof.  We  first  prove  ^-promotion  and  /-promotion  by  the  homomorphism  lemma.  We  then  use 
these  results  to  show  o-promotion. 

Note  that  each  of  these  laws  corresponds  to  a  whole  set  of  program  transformations.  The 
promotion  laws  say  that  rather  than  mapping,  reducing  or  filtering  over  one  large  structure,  one 
can  divide  the  structure  into  smaller  ones,  map  reduce  or  filter  each  of  these,  and  combine  the 
results. 

3.4  Selection 

An  important  class  of  operations  are  selection  operations,  which  select  one  out  of  two  values. 
Minimum  and  maximum  are  such  operations: 

Minimum,  Maximum,  a  J,  6  selects  the  smaller  of  a  and  6,  and  a  j  6  selects  the  larger. 

If  we  now  write  {/  x  for  some  structure  x,  we  get  the  smallest  component  of  x.  A  problem  can 
arise  if  x  =  0.  0  does  not  contain  any  components,  hence  |  /0  is  undefined.  This  can  be  fixed  by 
introducing  a  fictitious  value  oc.  Such  a  domain  extension  can  drastically  simplify  an  algorithmic 
expression,  since  it  reduces  the  number  of  special  cases  to  be  considered.  However,  it  may  introduce 
inconsistencies  with  additional  laws  or  with  laws  involving  other  operations  on  the  domain.  To  give 
an  example  of  the  possible  inconsistencies,  consider  the  operation  <C  defined  by 

a  <C  6  =  a. 

This  selection  operation  is  associative,  since  (a  <C  6)  >C  c  =  a  <C  (6  <C  c)  =  a.  The  function  <C/ 
selects  the  first  element  of  a  list  (or  the  leftmost  element  of  a  tree).  Now  consider  <C/0,  where 
0  is  the  empty  list.  Then  (<C/0)  <C  a  =  a,  since  <C/0  is  the  identity  element  of  <C-  But  from 
the  definition  of  «C,  we  know  that  (<C/0)  <C  a  =  <C/0.  Hence  a  =  <C/0  for  arbitrary  a.  The 
problem  arises  since  the  law  a  <C  6  =  a  already  involves  the  identity  element  of  <C,  in  fact,  each 
element  is  a  right  identity  element  of  To  resolve  this  inconsistency,  we  can  either  restrict  the 
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law  a<5=«tofi  /  </0,  or  use  </0  as  a  right  identity  only.  Which  solution  is  better  depends 
on  the  particular  application. 

Many  programming  problems  can  be  formulated  as  optimization  problems:  find  the  smallest, 
largest  or  cheapest  in  some  given  class  of  values.  Such  problems  can  be  specified  using  the  following 
operation: 


Optimize.  If  /  is  a  function,  a  [j  b  selects  either  a  or  b  according  to  which  is  smaller,  /  a  or  /  b. 
The  definition  of  j/  is  analogous,  it  selects  a  or  b  depending  on  which  is  greater,  /  a  or  /  b.  We 
have 


0  ifb 


a  if  /  a  <  f  b 
b  if  /  a  >  /  b. 


What  happens  in  the  case  /  a  =  /  b  ?  If  /  is  an  injective  function,  then  a  =  6,  and  we  return 
that  value.  However,  in  a  lot  of  cases  the  function  /  is  not  injective.  Then  it  makes  no  sense  to 
say  “let  a  b  be  the  element  in  the  set  {a,  b}  minimizing  /”,  instead  we  should  say  “let  a  If  b 
be  some  element  in  the  set  {a,  6}  minimizing  /”.  Thus  [j  contains  some  nondeterminism.  In 
developing  solutions  to  optimization  problems,  is  is  desirable  to  allow  nondeterminism,  since  we 
are  not  normally  interested  in  any  other  property  of  the  result  than  that  it  minimizes  /. 


3.5  Nondeterminism 

In  the  following,  we  will  examine  nondeterminism  in  more  detail.  As  a  matter  of  fact,  the  members 
of  the  Algorithmics  group  have  different  views  as  to  what  exact  approach  to  nondeterminism  should 
be  taken.  We  discuss  three  possible  approaches  [Mee83,B+85b], 

One  possibility  is  avoiding  explicit  nondeterminism  and  allowing  under-specification  only  through 
set  construction.  In  this  approach,  one  formulates  the  set  of  all  solutions  to  a  problem,  and  then 
selects  a  particular  member  of  this  set  by  some  further  step.  The  main  advantage  of  this  method 
is  that  the  semantics  of  the  expression  language  is  simpler.  On  the  other  hand,  the  objects  and 
functions  in  consideration  become  more  complicated;  instead  of  equality  predicates  it  becomes 
necessary  to  go  to  set  membership. 

As  a  second  possibility,  we  can  allow  a  choice  operator  []  into  the  notation,  but  let  it  always 
denote  some  definite,  but  unspecified,  operator.  The  only  property  of  |  one  may  assume  is  that  | 
is  selective.  More  precisely, 

a  []  b  =  sel{a,  b) 

for  some  selection  function  sel  which  is  to  be  specified  later.  This  model  is  semantically  simple,  but 
note  that  all  different  occurrences  of  []  must  be  bound  to  the  same  sel.  Hence  certain,  laws  which 
seem  obvious  may  contradict  each  other.  For  instance,  the  laws 

a  |  b  —  b  |  a 

cannot  be  valid  together  with 
/(«  1!  b )  =  (/  a)  fl  (/  b). 

To  see  this,  chose  /,  a  and  b  such  that  a/  b,f  a  =  b  and  /  b  =  a. 

The  third  approach  is  to  have  an  operator  |  denoting  arbitrary  choice.  The  following  laws 
involving  []  are  desirable,  but  mutually  incompatible: 
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(i)  A  well-defined  notion  of  a  reflexive,  transitive  refinement  operation  (=>■)  such  that  all  con¬ 
structs  are  monotonic  with  respect  to  =>• . 

(ii)  The  property  that  x  =$■  y  \S  x  \  y  =  x. 

(iii)  The  laws  that  Q  is  commutative,  associative  and  idempotent;  note  that  this,  together  with 
(ii),  implies  reflexivity  and  transitivity  of  ,  whereas  (ii)  together  with  reflexivity  of  would 
imply  idempotence  of  []. 

(iv)  The  law  x  []  y  =>■  x,  which  would  follow  from  (ii)  and  (iii). 

(v)  The  requirement  of  referential  transparency,  which  is  closely  related  to  the  question  whether 
x  Q  y  —  x  |]  y  =  0.  Note  that  the  second  approach  above  keeps  this  property. 

(vi)  The  law  /( x  |  y)  —  (/  x)  Q  (/  y),  or  the  weaker  /( x  Q  y)  =>  (/  x)  Q  (/  y).  Note  that  the  latter 
would  follow  from  (i)  and  (ii). 

(vii)  The  law  (x  =>■  y)  A  (x  =>■  z )  implies  x  =>  (y  |  z),  which  would  follow  from  (ii)  and  (iii). 

(viii)  The  law  V  x  :  f  x  g  x  implies  /  =>  g.  The  other  direction  would  follow  from  monotonicity 
of  =?>. 

We  can  show  that  these  laws  are  mutually  incompatible,  even  if  the  difficult  law  (v)  is  dropped. 
Note  that  fl/ai  stands  for  an  “arbitrary”  choice  from  the  structure  x.  In  particular,  |/0  describes 
choosing  from  an  empty  structure.  What  does  |/0  mean?  It  means  that  no  choice  is  possible,  i. 
e.,  it  denotes  the  unsatishable  specification.  In  particular,  |/0  satisfies  a||/0  =  a,  meaning  that 
having  the  choice  between  “nothing”  and  “something”,  we  must  choose  “something”. 

3.6  Semantics 

All  expressions  encountered  so  far  are  algorithms  in  the  sense  that  we  could  build  a  machine  to 
actually  execute  them.  In  many  cases,  however,  we  are  interested  in  being  able  to  specify  our 
problem  rather  than  giving  a  method  of  solving  the  problem,  especially  if  we  do  not  yet  know  such 
a  method.  We  will  allow  such  “unexecutable”  expressions  so  that  we  are  able  to  have  the  complete 
derivation,  from  the  initial  (formal)  specification  to  the  final  algorithm  in  one  unified  framework. 

In  the  following,  we  give  a  possible  approach  to  the  semantics  of  algorithmic  expressions  [Mee83] 
[B+85b],  Let  £  stand  for  the  set  of  algorithmic  expressions.  We  assume  that  £  is  recursive,  and  that 
£  contains  a  recursive  subset  V  of  expressions  that  are  identified  with  values  (e.  g.,  “7”  or  “A  x  : 
x  +  2”).  Intuitively,  we  can  interpret  an  expression  e  in  £  as  “specifying”  one,  or  more,  or  possibly 
no,  elements  of  V.  We  define  the  breadth  function  /3(e)  to  be  the  set  {v  £  V  \  e  “specifies”  u}. 
On  the  other  hand,  we  can  interpret  e  as  a  “task”  to  find  some  element  of  V.  That  task  might 
have  several  solutions  or  be  impossible.  Define  e  e'  to  mean:  the  task  e  can  be  solved  by  solving 
the  task  e' .  The  refinement  relation  =>  is  a  subset  of  £  X  £.  We  can  think  of  =$■  as  “may  be 
transformed  to”.  The  refinement  relation  is  reflexive  and  transitive.  Interpreting  an  expression  e 
as  a  specification  of  values  in  V,  we  would  expect  e  to  specify  a  given  v  G  V  whenever  e  =$■  v. 
Conversely,  if  v  £  /3(e),  then  v  is  a  solution  of  the  task  e,  so  we  have  e  v.  We  conclude  that 
/3(e)  ={t)£  V  |  e  =>■  u}. 

We  would  like  to  characterize  in  terms  of  B.  A  requirement  for  e  e'  is  certainly  /3(e)  C 
B(e').  But  then,  for  any  e,  e  |/0,  which  is  unreasonable  unless  e  =  Q/0.  This  gives  rise  to  the 
second  requirement  that  |]/0  is  a  replacement  for  e  only  if  e  =  |/0.  However,  this  requirement  of 
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“preservation  of  defmedness”  complicates  the  transformation  rules.  An  alternative  approach  is  to 
accept  the  validity  of  e  =£•  |/0,  keeping  in  mind  that  =>  does  not  exactly  mean  “may  be  replaced  by5’. 
In  this  approach,  we  would  check  for  preservation  of  definedness  individually  for  transformations 
involving  =^. 

Note  that  the  meaning  of  =>  and  the  derivation  of  further  refinement  rules  highly  depend  on 
the  chosen  approach  to  nondeterminism. 

3.7  Application  to  Lists 

Since  the  (finite)  list  is  a  data  structure  with  many  important  applications,  we  would  like  to  examine 
some  specialized  list  operations  [Bir86].  For  lists,  '  is  the  function  a  [a],  ®  is  the  concatenation 
o,  and  0  is  the  empty  list  [  ]. 

Length.  The  length  of  a  list  is  the  number  of  elements  it  contains.  We  denote  this  operation  by  jj. 

Directed  Reduction.  We  now  introduce  two  new  reduction  operators  which  are  closely  related 
to  /.  (left-reduce)  and  (right-reduce)  each  take  three  arguments,  and  operator  ®,  an  initial 
value  e  and  a  list  x.  They  can  be  described  by 

(®^e)[al7  02,  ■  •  ■ ,  an]  =  Gi  ®  (a2  ®  ( •  •  •  ( an  ®  e))) 


(':  /-e )[«i ,  a2, . . . ,  fij  =  ((e  ®  Gi)  ®  g2)  •  •  -®  an. 


Why  do  we  need  two  more  reduction  operators?  There  are  two  main  answers  to  this  question. 
First,  the  directed  reductions  can  be  seen  as  implementations  of  the  operator  /  in  which  the  order 
of  evaluation  is  sequential.  Certainly,  if  ®  is  associative  and  has  identity  element  e,  then 

®/  =  (®^e)  =  (®/>e). 

The  second  answer  is  that  many  more  functions  can  be  described  by  directed  reductions  than 
by  /.  Furthermore,  although  every  homomorphism  can  be  expressed  as  a  reduction,  many  functions 
which  are  not  homomorphisms  can  be  described  as  directed  reductions. 

Observe  that  we  can  characterize  both  forms  of  directed  reduction  recursively,  we  have 


(6  7^)1.] 

(®^e)([a]  o  x ) 

and 


(  ® /-(;)(  X  O  [g]) 


e 

a  ®  (®^Ae)x 


e 

(®7®e)x  ®  a 


We  see  that  processes  lists  from  right  to  left,  and  7 4  from  left  to  right.  In  some  sense,  right- 
reduction  corresponds  to  recursion ,  and  left-reduction  corresponds  to  iteration.  To  make  this  more 
clear,  we  give  an  alternative  recursive  description  of  left-reduction: 


(®^e)[]  =  e 

(®/>e)([a]  ox)  =  (®/>(e  ®  a))x. 


The  equivalence  of  the  two  definitions  can  be  easily  verified  by  induction. 
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Note  that  although  f-  and  />  look  similar,  there  may  be  a  big  difference  in  efficiency  between 
them,  depending  on  the  application.  Generally,  when  processing  lists  from  left  to  right,  all  elements 
of  the  list  have  to  be  considered  before  the  result  can  be  returned.  However,  not  all  right-to-left 
computations  must  necessarily  start  processing  at  the  right  end  of  the  list.  Returning  a  result 
without  evaluating  arguments  whose  values  are  not  needed  is  known  as  lazy  evaluation.  Consider 
for  example  the  function  (<C^-e),  which  selects  the  first  element  of  a  list. 

(<^e)[l  .  . .  100]  =  l<(<^e)[ 2  .  . .  100]  =  . . .  =  1, 

so  the  evaluation  terminates  after  one  step,  and  we  do  not  look  at  the  rest  of  the  list.  On  the  other 
hand,  (<C/>0)  always  returns  0,  but,  using  the  second  recursive  definition  of  />, 

(</>0)[l,2,3]  =  ...  =  0. 

but  in  this  evaluation,  the  complete  list  is  traversed  before  the  result  is  returned. 

Formal  Differentiation.  To  conclude  this  section,  we  would  like  to  give  an  idea  of  how  formal 
differentiation  [Pai81,PK82,Pai86]  fits  into  the  Algorithmics  framework  [Mee],  In  many  applications 
we  want  to  evaluate  an  expression  of  the  form 


/(  1  /■(  )K,  02,  ■  •  •  ,  (t,i\ 

or  even 

/*[oi,  Or  0  02, . . . ,  (©74e)[a1,  a2,...,  an]]. 

We  would  like  to  compute  such  a  value  incrementally,  i.  e.,  (for  the  first  expression)  in  the  form  of 
(0/0/  e))[oi,  02, .  ..,o»]. 

We  can  do  so  by  finding  an  operator  0  such  that 
f{x  ©  a)  =  (/  x)  ®  a, 

since  /o(©74e)  satisfies  the  recursive  equations 

(/°(©/>e))[]  =  fe 

(/c(a/-t))(x  o  a)  =  (if °(©/>e))z)  ®  a, 

which  are  also  solved  by  ©/>(/  e).  Although  such  an  operator  (x)  cannot  necessarily  be  found  for 
all  /  and  0,  it  is  always  possible  to  find  one  for  (/,  id),  defined  by 

y  ®  Z  =  ( / (  7T 2 0  z),n2y  ©  z). 

If  (/,  id)  can  be  computed  efficiently,  then  so  can  /  =  7Ti (/,  id) ,  where  7r,-  returns  the  i-th  component 
of  a  tuple. 
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4  The  RAPTS  Transformational  Programming  System 

RAPTS  [Pai86]  is  a  system  for  the  fully  mechanical  transformation  of  problem  specifications  to 
efficient  RAM  code.  RAPTS  uses  stepwise  refinement  by  successive  application  of  correctness¬ 
preserving  transformations  to  automate  program  design,  verification,  and  analysis. 

The  input  language  for  the  RAPTS  compiler  is  SQ  +  ,  an  abstract  functional  specification  lan¬ 
guage  based  on  finite  set  theory,  which  can  be  shown  to  express  any  partial  recursive  function  in 
a  fixed-point  normal  form.  SQ+  is  essentially  a  functional  subset  of  SETL  [SDDS86]  augmented 
with  fixed-point  operations.  The  compiler  produces  sequential  RAM  code  in  the  C  language. 

RAPTS  uses  several  techniques  to  produce  efficient  target  code  which  correspond  to  different 
phases  of  the  compiler.  The  first  phase  of  the  compiler  translates  expressions  from  fixed-point 
normal  form  into  an  iterative  fixed-point  form.  The  next  phase  applies  finite  differencing  to  the 
code.  The  final  phase  performs  data  structure  selection  for  a  RAM.  The  compiler  is  biased  towards 
greedy  strategies. 

In  addition  to  the  compiler,  RAPTS  provides  several  other  functions  such  as  maintaining  trans¬ 
formation  libraries,  parsing/unparsing  etc.  A  prototype  of  the  system  was  implemented  in  SETL. 

The  RAPTS  methodology  aims  to  be  practical,  emphasizes  automation,  and  focuses  on  a  sub¬ 
class  of  determinate,  tractable  problems  that  compute  finite  sets.  This  subclass  captures  a  wide 
variety  of  problems  arising  in  practice. 

We  will  discuss  the  language  SQ+  and  the  techniques  used  in  the  three  phases  of  compilation. 

4.1  The  Specification  Language  SQ+ 

SQ+  [PH87,CP88]  is  a  very-high-level  functional  problem  specification  language.  It  is  a  functional 
subset  of  SETL,  consisting  of  expressions  over  boolean  and  integer  data  types  and  finite-set  expres¬ 
sions,  enhanced  with  fixed-point  expressions.  SQ+  provides  function  abstraction.  The  semantics 
of  SQ+  is  defined  operationally  in  terms  of  a  lower-level  imperative  set-theoretic  machine  language 
(see  tables). 

Most  SQ+  expressions  conform  to  well-known  mathematical  notations,  with  the  exception  of 
maps.  We  regard  a  map  as  a  finite  set  of  ordered  pairs  that  maps  a  domain  set  to  a  range  set. 
Thus,  a  map  can  be  single- valued  or  multi-valued.  Function  retrieval  is  denoted  by  g(x),  while 
multi-valued  map  retrieval  is  denoted  by  g{x}. 

SQ  +  expressions  and  lower-level  constructs  are  described  in  the  tables  below,  and  their  set- 
theoretic  complexity  is  given.  The  complexity  measure  can  be  based  on  efficient  hash-table  im¬ 
plementations  of  sets  and  maps.  We  assume  that  a  single  hash  operation  on  a  data  item  with 
unit-space  storage  takes  unit  time,  and  that  searching  through  a  set  takes  time  proportional  to  the 
cardinality  of  the  set.  In  the  tables,  let  Q  and  T  be  any  stored  sets,  and  let  g  be  a  map. 

The  sublanguage  SQ  (SQ+  without  fixed-point  expressions)  was  shown  to  have  at  least  the 
expressive  power  of  Relational  Algebra.  To  express  transitive  closure,  we  need  to  add  the  fixed- 
point  expressions  LFPc, ${F)  and  GFPc ,b(F)  (for  least  and  greatest  fixed  point,  respectively)  to 
the  expressions  in  the  tables. 

In  addition,  we  allow  specifications  either  given  in  fixed-point  normal  form,  i.  e., 

the  Q  :  S  C  Q  \  Q  —  F(  Q)  minimizing  Q , 

the  Q  :  Q  C  B  \  Q  =  F(  Q)  maximizing  Q, 

or  in  the  more  general  form 

the  Q  :  S  C  Q  \  K(Q)  minimizing  Q, 

the  Q  :  Q  C  B  \  I\(Q )  maximizing  Q. 
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Expression/ Operation 

Definition 

Complexity 

Q-  =  {) 

assign  empty  set 

0(1) 

Q  with  :=  x 

set  element  addition 

0(1) 

Q  less  :=  x 

set  element  deletion 

0(1) 

x  G  Q 

set  membership  test 

0(1) 

3  Q 

arbitrary  choice 

0(1) 

3  x  e  Q 

test  for  empty  set  with 
arbitrary  assignment  to  x 

0(1) 

g{x } 

image  set  of  x  under  g 

0(1) 

ffO) 

y,  if  g{x }  =  M 

0  (undefined),  otherwise 

0(1) 

g{x }  ■=  {} 

make  image  set  empty 

0(1) 

g{x}  with  :=  y 

add  element  to  image  set 

0(1) 

g{x]  less  :=  y 

delete  element  from  image  set 

0(1) 

y  e  g{x } 

image  set  membership  test 

0(1) 

g(x)  :=  0 

remove  x  from  domain  g 

0(1) 

g( x)  :=  f 

make  g(x)  =  z 

0(1) 

domain  g 

elements  with  nonempty  (7-image 

0(1) 

(for  x  G  Q) 

Block(  x) 

end  for 

execute  Block  for  each 
element  x  in  a  copy  of  Q 

0(  jjQ  *  cost  (Block  (2;))) 

Table  1:  Elementary  SQ  operations 


Expression/ Operation 

Definition 

Complexity 

range  g 

set  of  all  images  under  g 

0(|fl) 

set  cardinality 

O(fQ) 

g[Q\ 

image  of  Q  under  g 

0(|Q) 

{x  e  Q  \  I\(x)} 

set  former 

0(| Q  *  cost(  A'(;r))) 

{e(ic)  :  x  G  Q} 

set  former 

0(jjQ  *  cost(e(  2))) 

QnT 

set  intersection 

Off  Q) 

Q-T 

set  difference 

0  (IQ) 

QU  T 

set  union 

0  (#g  +  BT) 

3  x  G  Q  |  K(x) 

existential  quantifier 

0()j Q  *  cost( K(x))) 

V  x  G  Q  |  K(x) 

universal  quantifier 

0()jQ  *  cost( K(x))) 

QxT 

cartesian  product 

0  (#g*#T) 

min  /  Q 

minimum  value  in  Q 

0(fQ) 

Q  :=  T 

copy  set  T  to  set  Q 

o|  g) 

Table  2:  Nonelementary  SQ  operations 
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These  specifications  are  an  extension  of  SQ  +  ;  RAPTS  tries  to  transform  them  to  fixed-point  ex¬ 
pressions. 

Although  SQ+  is  Turing-complete,  it  is  practical  to  focus  mainly  on  tractable  problems  ex¬ 
pressed  by  a  highly  restricted  subset  of  SQ  +  . 

4.2  Fixed-Point  Transformations 

Given  a  specification  of  the  form 

the  Q  :  S  C  Q  \  K(Q )  minimizing  Q, 

RAPTS  tries  to  rewrite  the  expression  I\(Q )  in  an  equational  form 
Q  =  F(Q). 

We  are  now  looking  for  a  fixed  point  of  F.  With  the  specification  in  this  form,  it  is  easier  to  check 
the  conditions  of  the  fixed-point  transformations  mechanically.  To  rewrite  I\(Q )  in  an  equational 
form,  RAPTS  uses  a  uniformly  terminating  rewrite  system,  whose  current  implementation  is  ad 
hoc  but  captures  a  number  of  frequently  occurring  problems.  If  the  conditions  of  the  fixed-point 
transformations  cannot  be  verified,  the  compilation  terminates. 

Whenever  various  conditions  on  h  (monotone,  inflationary  at  S ,  i.  e.,  h(S)  D  S ),  are  satisfied, 
the  specification 

the  P  :  P  C  S  |  h(P)  =  P  minimizing  P 

is  well-defined  and  can  be  implemented  using  the  following  imperative  code, 

P  :=  5' 

(converge) 

P  :=  h(P) 

end; 

which  converges  after  a  finite  number  of  steps.  The  analog  holds  for  the  dual  specification 
the  P  :  P  C  B  \  h(P)  =  P  maximizing  P. 

Such  an  implementation  may  be  inefficient  for  two  reasons.  First,  the  new  approximation  of 
P  is  completely  recomputed  and  copied  at  each  iteration,  although  it  might  differ  only  slightly 
from  its  previous  value.  Second,  depending  on  the  particular  h ,  the  iterative  step  may  be  biased 
towards  a  particular  search  strategy;  this  also  makes  the  analysis  difficult.  We  would  like  to  iterate 
nondeterministically  in  a  way  that  allows  us  to  take  advantage  of  only  slight  changes  of  P . 

Cai  and  Paige  [CP88]  developed  a  theory  for  fixed-point  computations  which  is  very  general  and 
applies  to  partially  ordered  sets  and  semilattices.  We  restrict  ourselves  to  presenting  applications  to 
collections  of  sets,  which  are  lattices.  This  restriction  makes  sense  since  the  set  is  one  of  the  simplest 
and  most  widely  used  data  types,  and  the  basic  set  operations  frequently  satisfy  the  conditions  of 
the  transformations. 

In  their  paper  [CP88],  they  develop  an  algebraic  approach  to  nondeterministic  iteration  in  fixed- 
point  computation.  A  partial  function  A  is  called  a  workset  function  if  A(Q,  P)  is  either  undefined 
or  empty  if  and  only  if  Q  C  P.  A  partially  defined  function  6  is  called  an  increment  function  if 
S(P,  z )  is  either  undefined  or  b(P ,  z )  A  P .  The  two  functions  are  said  to  be  feasible  relative  to  h  if 
an  increment  of  S  within  the  workset  strictly  increases  S  but  not  beyond  S  U  f(S).  Within  this 
framework,  the  following  transformation  can  be  proved  correct: 
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P  LFPc,w(/) 

1 

P  :=  W; 

(while  3  z  E  A(/(P),P)) 

P:=tf(P,*); 

end 

Whether  the  generated  code  is  efficient  depends  on  the  choice  of  A  and  8.  It  is  easy  to  find 
such  functions  for  a  powerset  T  and  a  monotone  function  /  by  choosing 


A  (Q,P) 


{}  if  Q  C  P 
{Q}  otherwise,  and 


8(P,z )  =  P  U  2. 

The  transformation  leads  to  the  usual  iteration 


P  :=  IT; 

(while  /(P)  3  P) 

/':•/(/'); 

end 

If  we  choose  A(Q,P)  =  Q  —  P  and  8(Pkp)  =  P  with  we  get  the  following  nondeterministic 
iteration: 

P  :=  IT; 

(while  ]  2  £  /(  P)  -  P) 

P  with  :  =  z; 

end 

It  is  possible  to  refine  the  basic  transformation  to  compute  fixed  points  for  various  functions  /. 
Under  certain  conditions,  for  example,  f(S)  and  S  U  f(S)  have  the  same  fixed  point,  so  that 

P  :=  LFPc ,w(S  U  f(3),S) 

I 

P  :=  IT; 

(while  3  z  E  A(/(  P),  P)) 

P:=8(P,zy, 

end 

The  transformation  can  also  be  refined  with  respect  to  different  underlying  data  types  such  as 
decomposable  lattices.  The  theory  can  be  generalized  to  deal  with  systems  of  fixed-point  equations. 

The  code  generated  by  our  transformations  might  still  be  inefficient,  since  the  computation  of 
A(/(P),P)  could  be  expensive  and  is  performed  with  each  iteration.  The  same  might  hold  for 
the  assignment  P  :=  8(P ,  z).  We  would  like  to  apply  finite  differencing  to  implement  fixed-point 
computations  more  efficiently. 

4.3  Finite  Differencing 

Finite  differencing  [Pai81,PK82,Pai86]  is  a  technique  used  to  avoid  recomputations  of  expressions 
of  the  form 
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f  {x  1 :  •  •  •  :  Xn  ) 

and  replace  them  with  less  expensive  incremental  computations. 

We  can  avoid  such  recomputations  by  maintaining  the  invariant 

E  —  /  ( ^1 7  •  •  •  ,  xn ) 

at  the  program  point  where  the  /  is  evaluated;  this  allows  us  to  replace  the  computation  of  /  with 
its  stored  value  E.  That  is,  we 

1.  establish  the  invariant  by  evaluating  /  into  E  on  entry  to  the  program  region  where  the  value 
of  /  is  needed; 

2.  update  E  within  that  program  region  whenever  any  of  the  parameters  xi,...,xn  of  /  are 
modified;  this  update  code  is  called  difference  code  for  E  with  respect  to  the  modifications 
dx.\,....  dx,t ; 

3.  replace  within  the  program  region  all  occurrences  of  f(x i, . . . ,  xn)  by  E. 

This  technique  is  called  finite  differencing,  a  generalization  of  strength  reduction.  Of  course, 
finite  differencing  is  worthwhile  only  if  the  cumulative  cost  of  executing  the  difference  code  for 
expression  E  within  the  program  after  finite  differencing  is  applied  is  less  than  the  cost  of  repeatedly 
evaluating  /  in  the  original  program.  The  idea  is  to  recognize  those  expressions  that  can  be 
maintained  inexpensively  as  invariants.  RAPTS  stores  a  finite  collection  of  elementary  differentiable 
expressions  and  their  associated  blocks  of  difference  code  guaranteed  to  be  inexpensive. 

Whenever  an  expression  can  be  recognized  as  being  composed  of  elementary  differentiable  ex¬ 
pressions,  then  finite  differencing  can  be  applied.  In  order  to  expose  hidden  elementary  differentiable 
expressions  and  to  make  the  code  more  regular,  expressions  are  placed  into  a  normal  form  using 
another  rewrite  system.  This  rewrite  system  performs  a  variety  of  minor  symbolic  manipulations 
such  as  turning  set  difference  and  intersection  into  equivalent  set  formers. 

The  difference  code  for  a  modification  dx  generally  consists  of  a  piece  of  code  before  the  modi¬ 
fication,  called  predifference  code,  and  a  piece  after  the  modification,  called  postdifference  code.  We 
write  d~  <  dx  >  and  d+  <  dx  >,  respectively.  The  predifference  and  postdifference  code  blocks 
can  modify  only  the  variable  E  and  local  variables.  Note  that  the  difference  code  is  not  unique. 

It  is  interesting  to  consider  collections  of  equalities  E\  =  fi  =  fi,  in  which  each  expression 

fi  depends  only  on  variables  aq, . . . ,  xn  and  E\ , . . . ,  Ej  \ .  For  the  purpose  of  efficiency,  we  would 
like  to  maintain  and  exploit  all  of  these  equalities  as  invariants  within  a  program  region  B.  The 
differential  of  ,  Ek  with  respect  to  B  is  denoted  by  d{Ei, . . .,  Eff  <  B  >.  It  is  obtained  from 

B  by  recursively  applying  the  following  transformations: 

1.  Replace  each  modification  dx  occurring  in  B  with 

d{Efi, . . .,  E). }  <  d~  Ei  <  dx  >  dx  d+Ei  <  dx  >>, 

where  no  new  occurrences  of  fi  are  introduced  within  the  difference  code  for  the  remaining 
invariants  E?, ... ,  Ek ,  and  all  occurrences  of  fi  can  be  replaced  by  Efi.  We  call  Ei  the  minimal 
invariant  for  the  differential  () { E\ , . . . ,  Eff  <  dx  >. 

2.  Replace  all  occurrences  of  fi  by  the  corresponding  Ej  within  the  rest  of  B 
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This  gives  us  the  general  chain  rules  for  collective  predifference  and  post  difference  code  blocks, 
where  E\  must  be  minimal  invariant: 

d~{E1: . . . ,  Ek}  <  dx  >  =  d{E2, . .  ■ ,  Ek}  <  d~ Ei  <  dx  >> 

3~{E2, . . . ,  Ek}  <  dx  > 

and 

d+{Et, . . . ,  Ek}  <  dx  >  =  d+{E2, . . . ,  Ek}  <  dx  > 

d{E2, . . . ,  Ek}  <  d+Ek  <  dx  >>  . 

Minimal  invariants  can  be  found  by  examining  the  data  dependency  dag  for  /i , . . 

In  order  to  determine  whether  finite  differencing  actually  improves  the  performance  of  the  code, 
the  set-theoretic  cost  of  the  resulting  program  is  determined  before  applying  any  transformations. 
This  syntactic  analysis  involves  three  components:  the  cost  of  establishing  the  invariants  before  the 
code  block,  the  cost  of  maintaining  them  within  the  block,  and  the  remaining  costs.  The  concept 
of  continuity  is  introduced  to  argue  about  the  costs  of  maintaining  invariants.  Strong  continuity 
means  that  the  worst  case  cost  of  reestablishing  an  invariant  after  a  single  parameter  is  modified 
is  bounded  by  a  constant.  Weak  continuity  means  that  the  cumulative  cost  of  maintaining  an 
invariant  relative  to  all  modifications  to  the  parameter  is  bounded  by  the  cost  of  establishing  the 
invariant  plus  the  number  of  modifications  to  the  parameter.  Certain  continuity  properties  hold 
for  the  composition  of  continuous  functions.  Although  the  concept  of  continuity  helps  only  in  the 
case  of  linear  complexity,  it  can  be  generalized  to  arbitrary  polynomial  complexities. 

Some  additional  techniques  can  be  used  to  speed  up  the  code  obtained  after  finite  differencing 
by  a  constant  factor: 

•  A  collection  of  invariants  can  easily  be  established  in  a  naive  way  by  separately  initializing 
each  E, .  RAPTS  uses  a  stream-processing  technique  to  efficiently  establish  several  mutually 
dependent  invariants  in  few  passes.  This  gives  a  const  ant -fact  or  speedup. 

•  Certain  collections  of  invariants  can  be  established  in  fewer  loops  using  two  other  loop¬ 
combining  techniques  called  vertical  and  horizontal  fusion. 

•  Finite  differencing,  streaming  and  fusion  increase  data  independence  and  possibly  introduce 
useless  code.  Useless-code  elimination  is  used  to  achieve  another  constant-factor  speedup  at 
the  end  of  the  finite-differencing  phase. 

4.4  Data-Structure  Selection 

The  code  obtained  after  the  first  and  second  phases  of  compilation  still  uses  sets  and  maps  as  basic 
data  types;  its  performance  is  analyzed  in  terms  of  the  set-theoretic  complexity  measure.  The 
third  phase  of  the  compiler,  whose  implementation  is  not  yet  complete,  implements  these  sets  and 
maps  using  conventional  storage  structures  on  a  uniform-cost  sequential  RAM  [PH87].  The  RAM 
code  generated  is  guaranteed  to  execute  with  the  same  worst-case  asymptotic  space  and  time  RAM 
complexities  as  the  set-theoretic  complexities  as  the  set-machine  code. 

This  last  compilation  phase  consists  of  the  following  two  steps: 

1.  All  non- element  ary  set-theoretic  operations  in  the  program  are  implemented  in  terms  of  the 
elementary  operations  shown  in  the  table.  The  resulting  program  is  said  to  be  in  set-machine- 
code  normal  form.  This  step  is  straightforward  and  does  not  change  the  complexity  of  the 
code. 
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2.  We  try  to  implement  each  elementary  set  operation  in  terms  of  conventional  RAM  operations 
such  that  the  asymptotic  worst-case  time  and  space  complexities  on  the  RAM  are  as  good  as 
the  set-theoretic  complexities. 

These  performance  objectives  cannot  always  be  achieved.  In  that  case,  the  compiler  would 
apply  heuristics  such  as  representing  sets  as  hash  tables  or  search  trees. 

RAPTS  tries  to  avoid  costly  copy  operations  and  hidden  costs  of  garbage  collection  by  imposing 
various  copy  avoidance  and  deallocation  conditions. 

Depending  on  the  kind  of  operations  required  for  each  stored  set  (maps  are  stored  as  sets) 
in  the  program,  RAPTS  chooses  an  appropriate  internal  representation  for  that  set.  Possible 
representations  are: 

•  unbased  set,  a  doubly-linked  list  with  pointers  to  the  first  and  last  list  cell; 

•  local  set,  consisting  of  a  table  of  all  possible  values  of  the  set,  called  base,  pointers  linking  the 
table  elements  currently  in  the  set,  and  pointers  to  the  first  and  last  element; 

•  sparse  set,  a  doubly-linked  list  where  each  cell  points  to  the  corresponding  element  in  the 
base  table  of  a  local  set. 

The  RAPTS  data-structure  selection  phase  can  be  viewed  as  a  highly  constrained  variant  of  the 
SETL  data-structure  selection  component. 

5  An  Comparative  Evaluation 

The  three  approaches  to  transformational  programming  we  have  presented  cannot  be  compared 
easily.  In  this  section,  we  first  present  a  framework  of  common  criteria  by  which  transformation 
systems  can  be  evaluated.  Then  we  compare  the  three  systems  with  respect  to  these  criteria. 

5.1  Criteria  for  Evaluating  Transformation  Systems 

We  group  our  criteria  in  general  criteria,  system  aspects,  and  language  aspects. 

5.1.1  General  Criteria 

objective:  The  primariy  goal  of  most  transformation  systems  is  the  general  support  for  program 
modification.  This  includes  the  optimization  of  control  structures,  the  implementation  of 
data  structures,  the  adaption  of  given  programs  to  particular  styles  of  programming,  and  the 
generation  of  new  transformation  rules.  Further  goals  are  program  synthesis,  the  derivation 
of  a  program  from  a  specification,  program  adaption  to  particular  environments  or  languages, 
program  description,  and  (deduction-oriented)  verification. 

problem  domain:  An  important  criterion  is  the  problem  domain  a  transformation  system  can 
handle.  Some  systems  restrict  their  problem  domain  to  provide  a  higher  degree  of  automation. 
Others  handle  a  wide  range  of  problems  but  require  that  transformations  be  selected  by  the 
user. 

extensibility:  Several  parts  of  a  transformation  systems  may  be  extensible;  the  library  of  trans¬ 
formation  rules  may  be  extended,  or  a  new  specification  or  programming  language  may  be 
incorporated. 
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5.1.2  System  Aspects 

organization  and  types  of  transformations:  Most  systems  provide  a  predefined  collection  of 
transformation  rules,  which  may  later  be  extended.  There  are  two  contrary  methods  for 
maintaining  the  library  of  transformations  of  the  system:  the  catalog  approach  and  the  gen¬ 
erative  set  approach.  A  catalog  is  a  large  collection  of  domain-specific  knowledge.  On  the 
other  hand,  a  generative  set  is  a  small  set  of  powerful  elementary  rules,  from  which  additional 
rules  can  be  derived.  Recently,  there  seems  to  have  been  a  tendency  towards  the  latter. 

form  of  transformation  rules:  In  general,  transformation  rules  consist  of  input  template,  out¬ 
put  template,  and  applicability  condition.  The  input  and  output  templates  may  be  related 
by  equivalence  or  descendance.  Rules  may  be  procedural  (algorithmic)  or  schematic.  Typi¬ 
cally,  procedural  rules  are  used  as  global  rules,  whereas  schematic  rules  are  locally  applicable 
refinement  rules. 

transformation  process  and  mechanization:  Systems  may  be  fully  manual,  semi-automatic, 
or  fully  automatic.  A  manual  system  is  practical  only  in  connection  with  powerful  transforma¬ 
tion  rules.  Semi-automatic  systems  may  require  user  assistance  on  certain,  hard  transforma¬ 
tions.  Fully  automatic  systems  often  restrict  the  problem  domain  and  apply  domain-specific 
heuristics. 

system  support:  Besides  the  system  component  that  deals  with  transformations,  a  system  may 
provide  facilities  for  prettyprinting  (often  in  connection  with  parsing  and  unparting)  and  for 
documentation  of  the  development  process  (ranging  from  low-level  bookkeeping  to  browsing 
the  decision  tree  representing  the  development  histor). 

5.1.3  Language  Aspects 

specification  and  target  languages:  The  choice  of  languages  used  is  closely  related  to  the  prob¬ 
lem  domain  of  a  transformation  system.  A  system  may  provide  separate  specification  and 
target  languages,  or  allow  the  development  of  programs  within  a  single,  wide-spectrum  lan¬ 
guage.  In  the  former  case,  the  specification  language  is  often  a  very-high-level  descriptive 
language,  and  the  target  language  a  conventional  programming  language,  whereas  a  wide- 
spectrum  language  accomodates  all  levels  of  programming  from  (non-executable)  specifica¬ 
tions  down  to  programs.  Systems  may  use  a  third  language  for  formulating  transformation 
rules  or  transformation  algorithms. 

data  types:  Again,  the  range  of  data  types  offered  by  a  system  is  linked  to  its  problem  domain. 
Transformation  systems  may  have  a  fixed  set  of  data  types,  e.  g.,  sets  and  maps,  or  support 
user-defined  algebraic  data  types. 

nondeterminism:  Nondeterminism  is  an  important  aspect  of  a  transformation  system.  It  allows 
the  programmer  to  avoid  over-specifying  a  problem;  this  may  keep  paths  to  an  efficient  im¬ 
plementation  open.  Nondeterminism  may  be  explicit  in  form  of  a  choice  operator,  or  implicit 
in  form  of  arbitrary  selection,  e.  g.,  from  a  set. 

mathematical  soundness:  Depending  on  its  problem  domain,  a  transformation  system  may  de¬ 
pend  on  certain  mathematical  methods.  It  is  desirable  for  these  methods  to  be  supported  by 
a  sound  theoretical  basis. 
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5.2  Objective  and  Problem  Domain 

The  goal  of  the  CIP  system  is  to  deal  with  general  programming  problems  and  to  support  the  user 
as  much  as  possible  in  mechanizable  tasks.  There  is  no  intent  to  mechanize  any  decisions  during 
the  transformation  process.  CIP  is  so  general  that  its  object  language,  CIP-L,  does  not  contain 
any  predefined  data  types.  All  data  types  are  user-defined;  some  type  definitions  might  be  found 
in  a  standard  library. 

The  Algorithmics  approach  tries  to  put  programming  into  a  mathematical  framework  based  on 
function  application.  While  a  mathematician  manipulates  formulas,  a  programmer  manipulates 
programs  in  the  same  way,  either  using  known  transformations  or  proving  new  results.  It  is  a  pure 
pencil-and-paper  approach,  no  automation  takes  place.  This  sets  Algorithmics  apart  from  the  two 
other  approaches,  which  strongly  emphasize  system  support. 

RAPTS  is  more  pragmatically  oriented.  Its  problem  domain  is  restricted  to  fixed-point  expres¬ 
sions  over  powerset  lattices,  which  it  automatically  compiles  to  efficient,  often  linear-time  imple¬ 
mentations.  RAPTS  also  focuses  on  easy  runtime  analysis  of  the  code  it  produces. 

5.3  Extensibility 

CIP-S  is  extensible  in  several  respects.  The  object  language  is  not  fixed;  it  can  be  replaced,  or  it 
can  be  extended  by  introducing  new  constructs  by  transformation  to  given  ones.  The  library  of 
transformation  rules  can  be  extended. 

Since  Algorithmics  is  a  mathematical  framework,  it  can  be  extended  arbitrarily.  New  types, 
operations,  and  transformations  can  be  added. 

As  a  generalization  of  sets  and  maps,  Cai  and  Paige  consider  fixed-point  transformations  for 
abstract  functions  defined  on  lattice-theoretic  data  types.  Finite  differencing  is  defined  abstractly 
for  any  data  type.  The  rewrite  systems  in  RAPTS  can  be  extended  to  cover  additional  problem 
specifications  and  differentiable  expressions.  New  transformations  can  be  added  as  well. 

5.4  Transformation  Libraries  and  Rules 

In  general,  transformation  rules  consist  of  input  template,  output  template,  and  applicability  con¬ 
dition. 

In  CIP,  input  and  output  templates  are  program  schemes  and  may  include  context  parameters. 
The  rules  are  organized  as  a  generative  set,  i.  e.,  a  small  set  of  powerful  rules  which  may  be  used 
in  transformation  expressions.  The  set  of  rules  is  extensible,  problem-specific  and  frequently  used 
rules  may  be  added. 

Transformations  in  Algorithmics  are  rules  for  the  manipulation  of  applicative  expressions.  The 
rules  have  the  flavor  of  mathematical  identities.  Algorithmics  tries  to  keep  rules  as  general  as 
possible  and  avoid  applicability  conditions. 

RAPTS  uses  different  kinds  of  transformations  at  the  various  phases  of  compilation.  Fixed- 
point  expressions  and  set  expressions  are  transformed  to  appropriate  normal  forms  using  rewrite 
systems.  Other  transformations  lead  from  applicative  to  iterative  form,  and  to  differential  form. 
The  rules  are  implemented  in  large  catalogs. 

The  form  of  transformation  rules  has  to  do  with  the  extent  of  mechanization  a  system  provides. 
Since  Algorithmics  is  non-automatic,  its  rules  are  kept  as  general  as  possible,  thus  making  it  easier 
for  the  user  to  manipulate  expressions  and  formulas.  More  specific  rules  with  complicated  appli¬ 
cability  conditions  as  in  CIP-L  or  RAPTS  calf  for  extensive  system  support  in  finding  appropriate 
rules  and  checking  conditions. 
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5.5  Mechanization  and  System  Support 

From  a  software  engineering  point  of  view,  it  is  important  that  a  system  automate  as  many  functions 
as  possible  and  leave  to  the  programmer  only  tasks  which  require  intuition. 

CIP  tries  to  provide  exactly  this.  CIP-S  keeps  track  of  the  complete  development  history  of  a 
program.  It  assists  the  user  in  verifying  applicability  conditions  of  transformations  and  rewrites 
program  pieces  after  applying  transformations. 

Algorithmics,  on  the  other  hand,  provides  no  mechanization.  Using  the  Algorithmics  method¬ 
ology  can  be  compared  to  a  mathematician’s  work  when  simplifying  arithmetic  formulas. 

By  dealing  with  more  specific  problems,  RAPTS  is  able  to  compile  fully  automatically  a  spec¬ 
ification  to  an  efficient  implementation.  In  each  compilation  phase,  program  pieces  are  rewritten 
mechanically. 

There  is  a  tradeoff  between  generality  and  mechanization.  By  restricting  the  problem  domain, 
specific  knowledge  can  be  incorporated  into  the  system  and  used  to  automate  design  decisions. 

5.6  Specification  and  Target  Languages 

CIP-L,  the  standard  object  language  of  CIP,  is  a  wide-spectrum  language  providing  constructs  from 
specification  level  down  to  control-oriented  level.  Since  CIP-L  is  a  scheme  language  whose  data 
types  can  be  adapted  to  the  problem  being  solved,  it  is  fully  general.  One  should  note  that  the 
object  language  in  CIP  is  not  fixed;  any  algebraically  defined  language  may  be  used. 

The  Algorithmics  language  consists  of  applicative  expressions  over  data  objects  and  functions. 
New  objects  and  operations  may  be  defined  in  terms  of  given  ones  using  equalities.  Functions  may 
be  polymorphic. 

RAPTS  distinguishes  between  specification  language  and  target  language.  Problems  are  speci¬ 
fied  in  SQ  +  ,  a  SETL-like  functional  language  with  fixed-point  expressions.  The  RAPTS  compiler 
produces  RAM  code.  Intermediate  steps  use  SQ+  in  which  descriptive  constructs  have  been  re¬ 
placed  by  iterative  ones. 

In  a  system  such  as  RAPTS,  which  fully  mechanically  compiles  a  specification  to  a  program, 
it  is  a  good  idea  to  have  separate  specification  and  target  languages.  In  CIP  and  Algorithmics 
design  decisions  are  made  by  the  user.  Gradually,  parts  of  the  program  are  replaced  by  lower-level 
constructs  as  we  approach  an  implementation,  while  other  parts  remain  unchanged.  A  single  object 
language  allows  different  stages  of  the  development  to  coexist  in  one  program. 

5.7  Data  Types 

CIP  allows  the  user  to  specify  any  algebraic  type  by  stating  its  objects  and  operations  and  the 
properties  that  hold  between  them.  Such  a  specification  is  effective  only  if  the  user,  in  addition, 
implements  a  model  of  the  type,  i.  e.,  explicitly  programs  the  operations  such  that  the  specified 
properties  hold.  Types  may  be  combined  hierarchically. 

A  similar  concept  is  used  by  the  Algorithmics  group.  Types  are  defined  by  specifying  its 
objects,  operations,  and  laws.  Algorithmics  does  not  require  types  to  be  implemented,  it  requires 
the  existence  of  a  term-generated  model  of  a  type.  A  type  hierarchy  can  be  formed  adding  more 
laws  to  a  given  type. 

RAPTS  uses  sets  and  maps  as  data  types.  Many  practical  problems  can  be  formulated  in  terms 
of  sets,  maps,  and  fixed-point  expressions.  This  restriction  makes  it  possible  to  provide  automatic 
implementation  of  the  data  types. 
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5.8  Nondeterminism 

CIP-L  provides  an  arbitrary-finite-choice  operator  which  applies  to  (non-functional)  values.  To 
avoid  semantic  difficulties,  finite  choice  between  functions  is  not  permitted.  The  additional  compre¬ 
hensive-choice  expression  nondeterministically  chooses  some  object  that  satisfies  a  predicate. 

The  Algorithmics  group  has  not  yet  decided  whether  to  allow  an  explicit  choice  operator  or  to 
allow  nondeterminism  indirectly  through  set  construction  only.  It  is  likely  that  a  choice  operator 
will  be  incorporated  in  the  language.  This  choice  operator  could  be  an  arbitrary- choice  operator 
as  in  CIP-L  or  an  operator  that  denotes  some  definite,  but  unspecified,  selection  operator. 

In  RAPTS,  nondeterminism  is  expressed  through  arbitrary  selection  from  a  set  and  arbitrary 
search  through  a  set.  This  avoids  restricting  the  order  in  which  the  solution  set  is  generated. 

The  advantage  of  avoiding  explicit  nondeterminism  is  a  simpler  semantics  of  the  object  language. 
However,  function  and  object  definitions  need  to  be  extended  beyond  equational  predicates  to 
general  set  membership  predicates.  Letting  the  choice  operator  denote  a  definite,  unspecified 
selection  operator  also  results  in  simpler  semantics,  but  certain  laws  (e.  g.,  commutativity)  may  no 
longer  be  valid.  We  believe  that  the  third  approach,  i.  e.,  allowing  an  arbitrary-choice  operator,  is 
more  appropriate,  because  one  can  develop  a  system  of  desirable  properties  of  the  choice  operator. 
Semantic  problems  can  be  avoided  by  restricting  the  choice  operator  to  a  choice  between  non¬ 
functional  values. 

5.9  Mathematical  Soundness 

The  CIP  group  made  a  big  effort  to  put  their  work  on  a  solid  mathematical  basis.  Part  of  the 
activities  in  the  CIP  project  was  the  development  of  transformational  semantics  for  CIP-L  and  a 
logical  calculus  of  program  transformations  for  CIP-S.  In  addition,  the  group  formally  developed 
the  existing  prototype  of  CIP-S  using  the  CIP  methodology. 

Mathematical  soundness  is  an  important  issue  in  the  work  of  the  Algorithmics  group.  Although 
certain  decisions  about  nondeterminism  and  semantics  have  not  yet  been  made,  they  are  considered 
to  be  very  important. 

The  theories  underlying  RAPTS,  i.  e.,  finite  differencing  and  fixed-point  theory  for  lattices, 
have  been  developed  formally.  As  far  as  the  system  itself  goes,  the  emphasis  is  more  on  pragmatic 
issues  than  on  formality. 

5.10  Conclusion 

It  is  difficult  to  compare  such  different  approaches  to  transformational  programming.  An  evaluation 
would  highly  depend  on  what  we  would  establish  as  the  goal  of  the  transformational  methodology. 
If  we  want  to  develop  a  mathematical  discipline  of  programming  independent  of  pragmatic  issues 
such  as  large  scale  programming,  the  direction  chosen  by  the  Algorithmics  group  seems  appropriate. 
If,  on  the  other  hand,  we  know  that  we  are  dealing  mainly  with  a  restricted  class  of  problems  that 
we  want  to  treat  in  a  pragmatic,  mechanized  way,  the  RAPTS  approach  is  well-suited.  Considering 
again  our  initial  motivation,  providing  a  transformational  component  of  a  general  prototyping 
system,  we  would  look  for  a  system  which  both  has  a  general  problem  domain  and  offers  extensive 
system  support  as  far  as  the  user  environment  is  concerned.  We  feel  that  the  CIP  group  is  facing  the 
right  direction  as  far  as  the  desgin  of  the  wide-spectrum  language  and  the  transformation  system 
goes.  However,  pragmatic  issues  should  be  given  more  weight;  the  system  should  provide  a  large 
library  of  frequently  used  standard  data  types,  a  comfortable  (implemented)  user  environment, 
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criterion 

CIP 

Algorithmics 

RAPTS 

objectives 

general  programming 

mathematical  framework 

super  compiler 

domain 

general 

functional 

hxpoint  expressions 

extensibility 

languages,  rules 

unrestricted 

rules 

transformations 

generative  set 

manual 

catalog 

rules 

input,  output,  condition 

general 

rewrite  rules 

mechanization 

semi-automatic 

none 

automatic 

system  support 

limited 

none 

extensive 

languages 

wide-spectrum,  schemes 

functional 

SETL  with  hxpoints 

types 

abstract,  algebraic 

algebraic 

sets  and  maps 

nondeterminism 

arbitrary  choice 

various 

selection  from  set 

Table  3:  Evaluation  of  Transformation  Systems  (Overview) 


and  a  library  of  automated  transformations  such  as  in  RAPTS  for  certain  well- understood  problem 
domains. 
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